The Effects of Disfluency Detection in Parsing Spoken Language

Author

  • Fredrik Jørgensen
Abstract

Spoken language contains disfluencies that, because of their irregular nature, may reduce the performance of data-driven parsers. This paper describes an experiment that quantifies the effects of disfluency detection and disfluency removal on data-driven parsing of spoken language data. The experiment consists of creating two reduced versions of a spoken language treebank, the Switchboard Corpus, mimicking speech recognizer output with and without disfluency detection and deletion. Two data-driven parsers are applied to the new data, and the parsers’ output is evaluated and compared.

Similar resources

Incremental Dependency Parsing and Disfluency Detection in Spoken Learner English

This paper investigates the suitability of state-of-the-art natural language processing (NLP) tools for parsing the spoken language of second language learners of English. The task of parsing spoken learner-language is important to the domains of automated language assessment (ALA) and computer-assisted language learning (CALL). Due to the non-canonical nature of spoken language (containing fil...

Joint Parsing and Disfluency Detection in Linear Time

We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-of-domain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint ta...

Disfluency Detection and Parsing of Transcribed Speech of Estonian

The paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. The modification of grammar and additional methods improved the recall from 97.5% to 97.7% and precision...

Joint Transition-based Dependency Parsing and Disfluency Detection for Automatic Speech Recognition Texts

Joint dependency parsing with disfluency detection is an important task in speech language processing. Recent methods show high performance for this task, although most authors make the unrealistic assumption that input texts are transcribed by human annotators. In real-world applications, the input text is typically the output of an automatic speech recognition (ASR) system, which implies that...

The impact of language models and loss functions on repair disfluency detection

Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel ...

Publication date: 2007